Abstract

In general, the representation of combinatorial objects is decisive for the feasibility of several enumerative tasks. In this work, we show how a (unique) string representation for (complete) initially-connected deterministic automata (ICDFA's) with n states over an alphabet of k symbols can be used for counting, exact enumeration, sampling and optimal coding, not only the set of ICDFA's but, to some extent, the set of regular languages. An exact generation algorithm can be used to partition the set of ICDFA's in order to parallelize the counting of minimal automata (and thus of regular languages). We also present a uniform random generator for ICDFA's that uses a table of pre-calculated values. Based on the same table, it is also possible to obtain an optimal coding for ICDFA's.

Keywords: regular languages, initially-connected deterministic finite automata, enumeration, random generation
In general, the representation of combinatorial objects is decisive for the feasibility of several enumerative tasks. In this work, we show how a (unique) string representation for (complete) initially-connected deterministic automata (ICDFA's) with n states over an alphabet of k symbols can be used for counting, exact enumeration, sampling and optimal coding, not only the set of ICDFA's but, to some extent, the set of regular languages. The key fact is that string representations are characterized by a set of rules that allow an exact and ordered generation of all its elements. An exact generation algorithm can be used to partition the set of ICDFA's in order to parallelize the counting of minimal automata, and thus of regular languages. With the same set of rules it is possible to design a uniform random generator for ICDFA's that uses a table of pre-calculated values (as usual in combinatorial decomposition approaches). Based on the same table it is also possible to obtain an optimal coding for ICDFA's (with or without final states).

In the next section, some definitions and notation are introduced. In Section 3 we review the string representation of non-isomorphic ICDFA∅'s (i.e., ICDFA's without final states), and how it can be used to generate and enumerate all ICDFA's. We also relate those methods to the ones presented by Champarnaud and Paranthoen in [CP05], by giving a new enumerative result. In Section 4 we briefly describe the implementation of a generator algorithm for ICDFA∅'s. Section 5 presents the methods for parallelizing the counting of languages by slicing the universe of ICDFA∅'s, and some experimental results are given. A uniform random generator for ICDFA∅'s is described in Section 6, along with some experimental results and statistical tests. Using the recurrence formulae defined in Section 6, we show in Section 7 how we can associate an integer with an ICDFA∅ and vice versa. Section 8 concludes with final remarks.

2 Preliminaries 

Given two integers $m \le n$ we represent the set $\{i \in \mathbb{N} \mid m \le i \le n\}$ by $[m, n]$. A deterministic finite automaton (DFA) $A$ is a quintuple $(Q, \Sigma, \delta, q_0, F)$ where $Q$ is a finite set of states, $\Sigma$ the alphabet, i.e., a non-empty finite set of symbols, $\delta: Q \times \Sigma \to Q$ is the transition function, $q_0$ the initial state and $F \subseteq Q$ the set of final states. The size of the automaton is given by $|Q|$. We assume that the transition function is total, so we consider only complete DFA's. As we are not interested in the labels of the states, we can represent them by an integer $i \in [0, |Q|-1]$. The transition function $\delta$ extends naturally to $\Sigma^*$. A DFA is initially-connected¹ (ICDFA) if for each state $q \in Q$ there exists a string $x \in \Sigma^*$ such that $\delta(q_0, x) = q$. The structure of an automaton $(Q, \Sigma, \delta, q_0)$ denotes a DFA without its final state information and is referred to as a DFA∅. For each structure, there will be $2^n$ DFA's, if $|Q| = n$. We denote by ICDFA∅ the structure of an ICDFA. Two DFA's $A = (Q, \Sigma, \delta, q_0, F)$ and $A' = (Q', \Sigma, \delta', q_0', F')$ are called isomorphic (by states) if there exists a bijection $f: Q \to Q'$ such that $f(q_0) = q_0'$ and, for all $\sigma \in \Sigma$ and $q \in Q$, $f(\delta(q, \sigma)) = \delta'(f(q), \sigma)$. Furthermore, for all $q \in Q$, $q \in F$ if and only if $f(q) \in F'$. The language accepted by a DFA $A$ is $L(A) = \{x \in \Sigma^* \mid \delta(q_0, x) \in F\}$. Two DFA's are equivalent if they accept the same language. Obviously, two isomorphic automata are equivalent, but two non-isomorphic automata may also be equivalent. A DFA $A$ is minimal if there is no DFA $A'$ with fewer states equivalent to $A$. Trivially, a minimal DFA is an ICDFA. Minimal DFA's are unique up to isomorphism. Domaratzki et al. [DKS02] gave some asymptotic estimates and explicit computations of the number of distinct languages accepted by finite automata with $n$ states over an alphabet of $k$ symbols. Given $n$ and $k$, they denoted by $f_k(n)$ the number of pairwise non-isomorphic minimal DFA's and by $g_k(n)$ the number of distinct languages accepted by DFA's, where $g_k(n) = \sum_{i=1}^{n} f_k(i)$.

¹ Also called accessible.
3 Strings for ICDFA's



Reis et al. [RMA05] presented a unique string representation for non-isomorphic ICDFA∅'s. In this section, we briefly review this representation and how it can be used to generate and enumerate all ICDFA's. We also give a new enumerative result and relate this representation to the one presented by Champarnaud and Paranthoen in [CP05].

Given a complete DFA∅ $(Q, \Sigma, \delta, q_0)$ with $|Q| = n$ and $|\Sigma| = k$, consider a total order $<$ over $\Sigma$. We can define a canonical order over the set of the states by exploring the automaton in a breadth-first way, choosing at each node the outgoing edges in the order considered for $\Sigma$. If we restrict this representation to ICDFA∅'s, then this representation is unique and defines an order over the set of its states. For instance, consider the following ICDFA∅ and consider the alphabetic order in $\{a, b, c\}$.
The states ordering is A, C, B, D and [1, 2, 0, 2, 3, 0, 3, 0, 2, 1, 3, 2] is its string representation. Formally, let $\Sigma = \{\sigma_i \mid i \in [0, k-1]\}$, with $\sigma_0 < \sigma_1 < \cdots < \sigma_{k-1}$. Given an ICDFA∅ $(Q, \Sigma, \delta, q_0)$ with $|Q| = n$, the representing string is of the form $(s_i)_{i \in [0, kn-1]}$ with $s_i \in [0, n-1]$ and $s_i = \delta(\lfloor i/k \rfloor, \sigma_{i \bmod k})$.
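The breadth-first numbering described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the experiments; the function name `canonical_string` is hypothetical, and the transition table below is simply the one encoded by the example string (states A, C, B, D numbered 0 to 3).

```python
from collections import deque


def canonical_string(delta, initial, alphabet):
    """Canonical string of a complete, initially-connected DFA0.

    delta maps (state, symbol) -> state; state labels are arbitrary.
    States are numbered in breadth-first order from the initial state,
    following the outgoing edges in the order given by `alphabet`.
    Returns the string and the visiting order of the original labels.
    """
    number = {initial: 0}
    order = [initial]
    queue = deque([initial])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            p = delta[(q, a)]
            if p not in number:          # first time this state is reached
                number[p] = len(order)
                order.append(p)
                queue.append(p)
    string = [number[delta[(q, a)]] for q in order for a in alphabet]
    return string, order


# Transitions of the example ICDFA0 over the ordered alphabet a < b < c.
EXAMPLE_DELTA = {
    ("A", "a"): "C", ("A", "b"): "B", ("A", "c"): "A",
    ("B", "a"): "D", ("B", "b"): "A", ("B", "c"): "B",
    ("C", "a"): "B", ("C", "b"): "D", ("C", "c"): "A",
    ("D", "a"): "C", ("D", "b"): "D", ("D", "c"): "B",
}
```

Calling `canonical_string(EXAMPLE_DELTA, "A", "abc")` visits the states in the order A, C, B, D and returns the string [1, 2, 0, 2, 3, 0, 3, 0, 2, 1, 3, 2]; renaming the states does not change the string, which is what makes it a canonical form for isomorphism classes.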

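Let $(s_i)_{i \in [0, kn-1]}$ with $s_i \in [0, n-1]$ be a string satisfying the following conditions:

$$(\forall m \in [2, n-1])(\forall i \in [0, kn-1])\,\bigl(s_i = m \Rightarrow (\exists j \in [0, i-1])\; s_j = m-1\bigr) \qquad \text{(R1)}$$
$$(\forall m \in [1, n-1])(\exists j \in [0, km-1])\; s_j = m \qquad \text{(R2)}$$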

In [RMA05] the following theorem was proved. 

Theorem 1 There is a one-to-one mapping between the strings $(s_i)_{i \in [0, kn-1]}$ with $s_i \in [0, n-1]$ satisfying rules (R1) and (R2) and the non-isomorphic ICDFA∅'s with $n$ states over an alphabet $\Sigma$ of size $k$.
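As a sanity check of Theorem 1, the following sketch (hypothetical helper names) tests rules (R1) and (R2) directly and counts, by brute force, the strings that satisfy them. It is only feasible for very small $n$ and $k$, since it scans all $n^{kn}$ strings.

```python
from itertools import product


def satisfies_r1_r2(s, n, k):
    """Check rules (R1) and (R2) for a string s of length k*n over [0, n-1]."""
    # (R1): a label m >= 2 may only occur after some occurrence of m - 1.
    seen = set()
    for x in s:
        if x >= 2 and x - 1 not in seen:
            return False
        seen.add(x)
    # (R2): every label m in [1, n-1] occurs among the first k*m symbols.
    return all(m in s[:k * m] for m in range(1, n))


def count_by_rules(n, k):
    """Brute-force count of the strings allowed by Theorem 1 (tiny n, k only)."""
    return sum(satisfies_r1_r2(s, n, k)
               for s in product(range(n), repeat=k * n))
```

For $k = 2$ the counts for $n = 2, 3, 4$ are 12, 216 and 5248, the numbers of ICDFA∅'s listed in Table 1.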

We note that this string representation can be extended to non-complete ICDFA∅'s by representing all missing transitions with the value $-1$. In this case, rules (R1) and (R2) remain valid, and we can regard $-1$ as an additional state whose transitions all go into itself. However, for enumeration and generation purposes we do not consider non-complete ICDFA∅'s.

In order to have an algorithm for the enumeration and generation of ICDFA∅'s, an alternative set of rules is used instead of rules (R1) and (R2). For $n = 1$ there is only one (non-isomorphic) ICDFA∅ for each $k \ge 1$, so in the following we assume that $n > 1$. In a string representing an ICDFA∅, let $(f_j)_{j \in [1, n-1]}$ be the sequence of indexes of the first occurrence of each state label $j$. For explanation purposes, we call those indexes flags. It is easy to see that (R1) and (R2) correspond, respectively, to (G1) and (G2):

$$(\forall j \in [2, n-1])\; (f_j > f_{j-1}); \qquad \text{(G1)}$$
$$(\forall m \in [1, n-1])\; (f_m < km). \qquad \text{(G2)}$$

This means that $f_1 \in [0, k-1]$, and $f_{j-1} < f_j < kj$ for $j \in [2, n-1]$. We begin by counting the number of allowed sequences of flags.
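The equivalence between (R1)/(R2) and (G1)/(G2) can be checked in the same brute-force way. In this sketch (hypothetical names) a string is accepted when every label $j \in [1, n-1]$ occurs and its flags satisfy (G1) and (G2); the resulting counts should coincide with the number of strings satisfying (R1) and (R2).

```python
from itertools import product


def flags_of(s, n):
    """Flags (f_1, ..., f_{n-1}): first position of each label j in [1, n-1].

    Returns None when some label does not occur at all."""
    try:
        return [s.index(j) for j in range(1, n)]
    except ValueError:
        return None


def satisfies_g1_g2(f, k):
    g1 = all(f[j] > f[j - 1] for j in range(1, len(f)))     # strictly increasing
    g2 = all(f[m - 1] < k * m for m in range(1, len(f) + 1))  # f_m < k*m
    return g1 and g2


def count_by_flags(n, k):
    """Number of strings over [0, n-1] whose flags exist and satisfy (G1), (G2)."""
    total = 0
    for s in product(range(n), repeat=k * n):
        f = flags_of(s, n)
        total += f is not None and satisfies_g1_g2(f, k)
    return total
```

For the example string of this section, `flags_of([1, 2, 0, 2, 3, 0, 3, 0, 2, 1, 3, 2], 4)` gives [0, 1, 4], which satisfies both rules for $k = 3$.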






Theorem 2 Given $k$ and $n$, the number $F_{k,n}$ of sequences $(f_j)_{j \in [1, n-1]}$ is given by

$$F_{k,n} = \sum_{f_1=0}^{k-1}\ \sum_{f_2=f_1+1}^{2k-1} \cdots \sum_{f_{n-1}=f_{n-2}+1}^{k(n-1)-1} 1 = \binom{kn}{n}\frac{1}{(k-1)n+1} = C_n^{(k)},$$

where $C_n^{(k)}$ are the (generalised) Fuss-Catalan numbers.
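Theorem 2 can be checked numerically. The sketch below evaluates the nested sums by dynamic programming over the value of the last flag placed and compares the result with the closed form. The helper names are hypothetical; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def count_flag_sequences(n, k):
    """Nested sums of Theorem 2: ways[f] counts the admissible prefixes
    (f_1, ..., f_j) whose last flag f_j equals f."""
    if n == 1:
        return 1                                # the empty sequence of flags
    ways = {f: 1 for f in range(k)}             # f_1 in [0, k-1]
    for j in range(2, n):
        new = {}
        for prev, w in ways.items():
            for f in range(prev + 1, k * j):    # f_{j-1} < f_j < k*j
                new[f] = new.get(f, 0) + w
        ways = new
    return sum(ways.values())


def fuss_catalan(n, k):
    """C_n^(k) = binom(kn, n) / ((k - 1) n + 1), always an integer."""
    return comb(k * n, n) // ((k - 1) * n + 1)
```

For $k = 2$ this gives the Catalan numbers 1, 2, 5, 14, 42, ..., and for $k = 3$ the sequence 1, 3, 12, 55, ... (A001764).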

Proof 1 The first equality follows directly from the definition of the $(f_j)_{j \in [1, n-1]}$. For the second, note that $C_n^{(k)}$ enumerates $k$-ary trees with $n$ internal nodes (see, for instance, [SF96]). In particular, for $k = 2$, the $C_n^{(2)}$ are exactly the Catalan numbers, which count binary trees with $n$ internal nodes. This sequence appears in Sloane [Slo03] as A000108, and for $k = 3$ and $k = 4$ as sequences A001764 and A002293, respectively. So it suffices to give a bijection between these trees and the sequences of flags. Recall that a $k$-ary tree is either an external node or an internal node attached to an ordered sequence of $k$ $k$-ary subtrees.




Figure 1: Two 3-ary trees with 4 internal nodes and the corresponding sequences of flags, [2, 5, 8] and [1, 2, 4].

Let $T_k$ be a $k$-ary tree and let $<$ be a total order over $\Sigma$. For each internal node $i$ of $T_k$, its outgoing edges can be ordered left-to-right and labelled with a unique symbol of $\Sigma$ according to $<$. Considering a breadth-first, left-to-right traversal of the tree and ignoring the root node (which is considered the 0-th internal node), we can represent $T_k$ uniquely by a bitmap, where a 0 represents an external node and a 1 represents an internal node. As the number of external nodes is $(k-1)n + 1$, the length of the bitmap is $kn$. Moreover, the $(j+1)$-th block of $k$ bits corresponds to the children of the $j$-th internal node visited, for $j \in [0, n-1]$. For example, the bitmaps of the trees in Figure 1 are [0,0,1,0,0,1,0,0,1,0,0,0] and [0,1,1,0,1,0,0,0,0,0,0,0], respectively. The positions of the 1's in the bitmaps correspond to a sequence of flags $(f_i)_{i \in [1, n-1]}$, i.e., $f_i$ corresponds to the number of nodes visited before the $i$-th internal node (excluding the root node). It is obvious that $(f_i)_{i \in [1, n-1]}$ satisfies (G1). For (G2), note that, for each internal node, the outdegree of the previous internal nodes is $k$. Conversely, given a sequence of flags $(f_j)_{j \in [1, n-1]}$, we construct the bitmap $(b_j)_{j \in [0, kn-1]}$ such that $b_{f_i} = 1$ for $i \in [1, n-1]$ and $b_j = 0$ for the remaining values of $j$. As above, for the representation of the $(j+1)$-th internal node, $\lfloor f_j/k \rfloor$ gives the parent and $f_j \bmod k$ gives its position among its siblings (in a breadth-first, left-to-right traversal).
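The bijection in the proof can be made concrete. In this sketch (hypothetical names) a $k$-ary tree is encoded as `None` for an external node and as a list of $k$ subtrees for an internal node; `flags_to_tree` uses $\lfloor f_i/k \rfloor$ and $f_i \bmod k$ exactly as above, and the traversal recovers the bitmaps of Figure 1.

```python
from collections import deque
from itertools import combinations


def tree_to_bitmap(tree):
    """Breadth-first, left-to-right traversal that ignores the root:
    one bit per child of each internal node (1 = internal, 0 = external)."""
    bits, queue = [], deque([tree])
    while queue:
        for child in queue.popleft():
            bits.append(0 if child is None else 1)
            if child is not None:
                queue.append(child)
    return bits


def bitmap_to_flags(bits):
    """The positions of the 1's are the flags f_1 < ... < f_{n-1}."""
    return [p for p, b in enumerate(bits) if b == 1]


def flags_to_tree(flags, n, k):
    """Internal node i >= 1 is child f_i mod k of internal node f_i // k."""
    nodes = [[None] * k for _ in range(n)]      # nodes[0] is the root
    for i, f in enumerate(flags, start=1):
        parent, position = divmod(f, k)
        nodes[parent][position] = nodes[i]
    return nodes[0]


def all_flag_sequences(n, k):
    """Increasing sequences (G1) with f_m < k*m (G2)."""
    return [list(c) for c in combinations(range(k * n), n - 1)
            if all(f < k * m for m, f in enumerate(c, start=1))]
```

A round trip flags, tree, bitmap, flags over all 55 sequences for $n = 4$, $k = 3$ returns each sequence unchanged, which is the bijection used in the proof.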

To generate all the ICDFA∅'s, for each allowed sequence of flags $(f_j)_{j \in [1, n-1]}$, all the remaining symbols $s_i$ can be generated according to the following rules:

$$i < f_1 \Rightarrow s_i = 0; \qquad \text{(G3)}$$
$$(\forall j \in [1, n-2])\,\bigl(f_j < i < f_{j+1} \Rightarrow s_i \in [0, j]\bigr); \qquad \text{(G4)}$$
$$i > f_{n-1} \Rightarrow s_i \in [0, n-1]. \qquad \text{(G5)}$$

In [RMA05] a simple combinatorial argument was given to show that

Theorem 3 The number of strings $(s_i)_{i \in [0, kn-1]}$ representing ICDFA∅'s with $n$ states over an alphabet of $k$ symbols is given by

$$B_{k,n} = \sum_{f_1=0}^{k-1}\ \sum_{f_2=f_1+1}^{2k-1}\ \sum_{f_3=f_2+1}^{3k-1} \cdots \sum_{f_{n-1}=f_{n-2}+1}^{k(n-1)-1}\ \prod_{i=2}^{n} i^{\,f_i - f_{i-1} - 1}, \qquad (1)$$

where $f_n = kn$.
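Equation (1) can be evaluated without enumerating the strings by accumulating, for each possible position of the current flag, the weighted number of prefixes. A minimal sketch (the function name is hypothetical); the factor $1^{f_1}$ for the positions before $f_1$ is included only to keep the loop uniform.

```python
def count_icdfa0(n, k):
    """B_{k,n} of Equation (1), by dynamic programming over the flags.

    ways[f] is the sum, over all admissible (f_1, ..., f_j) with f_j = f,
    of the product of i^(f_i - f_{i-1} - 1) for i <= j (with f_0 = -1)."""
    ways = {-1: 1}                           # sentinel f_0 = -1
    for j in range(1, n):                    # place f_j, f_{j-1} < f_j < k*j
        new = {}
        for prev, w in ways.items():
            for f in range(prev + 1, k * j):
                # the f - prev - 1 free positions between f_{j-1} and f_j
                # take any value in [0, j - 1]
                new[f] = new.get(f, 0) + w * j ** (f - prev - 1)
        ways = new
    # positions after f_{n-1} take any value in [0, n - 1] (rule G5)
    return sum(w * n ** (k * n - f - 1) for f, w in ways.items())
```

The values agree with the ICDFA∅ column of Table 1, e.g. 216 for $(n, k) = (3, 2)$ and 7965 for $(3, 3)$.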

In Section 6 we give another recursive definition that is more adequate for tabulation.
3.1 Analysis of the Champarnaud et al. Method 

Champarnaud and Paranthoen [CP05, Par04], generalizing work of Nicaud [Nic00], presented a method to generate and enumerate ICDFA∅'s, although they did not give an explicit and compact representation for them, such as the string representation used here. An order $<$ over $\Sigma^*$ is a prefix order if $(\forall x \in \Sigma^*)(\forall \sigma \in \Sigma)\; x < x\sigma$. Let $A$ be an ICDFA∅ over $\Sigma$ with $k$ symbols and $n$ states. Given a prefix order on $\Sigma^*$, each automaton state is ordered according to the first word $w \in \Sigma^*$ that reaches it in a simple path from the initial state. The sets of these words are in bijection with $k$-ary trees with $n$ internal nodes and, therefore, with the set of sequences of flags in our representation². It is then possible to obtain a valid ICDFA∅ by adding other transitions in a way that preserves the previous state labelling. For the generation of these sets of words, another set of objects is used, the generalised tuples, which are also in bijection with $k$-ary trees with $n$ internal nodes. The number of ICDFA∅'s is computed using recursive formulae associated with generalised tuples, akin to the ones we present in Section 6.

4 Generating ICDFA∅'s

In this section, we present a method to generate all ICDFA∅'s, given $k$ and $n$. We start with an initial string, and then consecutively iterate over all allowed strings until the last one is reached. The main procedure is the one that, given a string, returns the next legal one. For each $k$ and $n$, the first ICDFA∅ is represented by the string $0^{k-1}1\,0^{k-1}2 \cdots 0^{k-1}(n-1)\,0^{k}$ and the last is represented by $1\,2 \cdots (n-1)\,(n-1)^{(k-1)n+1}$. According to rules (G1) to (G5), we first generate a sequence of flags and then, for each one, the set of strings representing the ICDFA∅'s in lexicographic order. The algorithm to generate the next sequence of flags is the following, where the initial sequence of flags is $(ki-1)_{i \in [1, n-1]}$:

² Indeed, our order on the states induces a prefix order on $\Sigma^*$.
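# Global state: k is the alphabet size and f[1..n-1] holds the current
# flags (f[0] is unused).
def nextflags(i):
    if i == 1:
        f[1] = f[1] - 1        # f[1] < 0 means that all sequences were visited
    elif f[i] - 1 == f[i - 1]:
        f[i] = k * i - 1       # f_i cannot decrease further: reset it to its
        nextflags(i - 1)       # largest value (G2) and move the previous flag
    else:
        f[i] = f[i] - 1        # f_i > f_{i-1} still holds (G1)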

To generate a new sequence, we must call nextflags(n-1). Given rules (G1) and (G2), the correctness of the algorithm is easily proved. When a new sequence of flags is generated, the first ICDFA∅ for it is represented by a string with 0's in all the other positions (i.e., the lower bounds in rules (G3) to (G5)). The following strings with the same sequence of flags are computed in lexicographic order by the procedure nexticdfa, called with $a = n - 1$ and $b = k - 1$:

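# Global state: n, k, the flags f as above and the current string s
# (a list of k*n integers). Returns False when s was the last string for
# the current flags; its non-flag positions are then back to 0.
def nexticdfa(a, b):
    i = a * k + b
    while i in f[1:]:            # flag positions are fixed: skip them
        i = i - 1
    if i < f[1]:                 # only the zeros before f_1 remain (G3)
        return False
    j = n - 1
    while f[j] > i:              # f_j is the nearest flag not exceeding i
        j = j - 1
    if s[i] == s[f[j]]:          # s_i already holds its largest value, j
        s[i] = 0                 # reset it and carry to the previous position
        return nexticdfa(*divmod(i - 1, k))
    s[i] = s[i] + 1
    return True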

Note that the last string for each sequence of flags has the value $s_l = j$ for $l \in [f_j + 1, f_{j+1} - 1]$, with $j \in [1, n-1]$ (and $f_n = kn$). The time complexity of the generator is linear in the number of automata. As an example, for $k = 2$ and $n = 9$ it took about 12 hours to generate all the 705068085303 ICDFA∅'s, using an AMD Athlon at 2.5GHz. Finally, for the generation of ICDFA's we only need to add to the string representation of an ICDFA∅ a string of $n$ 0's and 1's, corresponding to one of the $2^n$ possible choices of final states.
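For small cases, the generation order described in this section can be reproduced without the in-place procedures above. The sketch below (hypothetical names) lists the flag sequences in the order visited by nextflags and, for each one, expands the free positions with `itertools.product`, which yields them in lexicographic order. It is meant for checking the first and last strings stated above, not as an efficient generator.

```python
from itertools import product


def flag_sequences(n, k):
    """Flag sequences allowed by (G1) and (G2), in the order visited by
    nextflags: starting from f_j = kj - 1 and decreasing."""
    def extend(prefix):
        j = len(prefix) + 1                       # index of the next flag
        if j == n:
            yield tuple(prefix)
            return
        low = prefix[-1] + 1 if prefix else 0     # (G1)
        for f in range(k * j - 1, low - 1, -1):   # (G2), largest value first
            yield from extend(prefix + [f])
    return extend([])


def icdfa0_in_generation_order(n, k):
    """For each flag sequence, all strings allowed by (G3) to (G5),
    in lexicographic order."""
    for flags in flag_sequences(n, k):
        free = [i for i in range(k * n) if i not in flags]
        # s_i ranges over [0, j], where j is the number of flags before i
        bounds = [sum(f < i for f in flags) for i in free]
        for values in product(*(range(b + 1) for b in bounds)):
            s = [0] * (k * n)
            for j, f in enumerate(flags, start=1):
                s[f] = j
            for i, v in zip(free, values):
                s[i] = v
            yield s
```

For $n = 3$ and $k = 2$ the first and last strings produced are 010200 and 122222, matching $0^{k-1}1\,0^{k-1}2\,0^k$ and $1\,2\,2^{(k-1)n+1}$.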

5 Counting Regular Languages (in Slices) 

To obtain the number of languages accepted by DFA's with $n$ states over an alphabet of $k$ symbols, we can generate all ICDFA's, determine which of them are minimal ($f_k(n)$) and calculate the value of $g_k(n)$. Obviously, this is in general an intractable procedure, but for small values of $n$ and $k$ some experiments can take place. We need an efficient implementation of a minimization algorithm, not because of the size of each automaton but because of the number of automata we need to cope with. For that we implemented Hopcroft's minimization algorithm [Hop71], using efficient set representations. For very small values of $n$ and $k$ ($n + k < 16$) we represented sets as bitmaps, and for larger values AVL trees [Avl] were used.

The problem can be parallelized provided that the search space can be safely partitioned. Using the method presented in Section 4, we can easily generate slices of ICDFA∅'s and feed them to the minimization algorithm. A slice is a sequence of ICDFA∅'s and is defined by a pair (start, last), where start is the first automaton in the sequence and last is the last one. If we have a set of CPUs available, each one can receive a slice, generate all ICDFA∅'s in that slice, generate all the necessary ICDFA's and feed them to the minimization algorithm. For the generation of ICDFA's, we used the observation by Domaratzki et al. [DKS02] that it is enough to test $2^{n-1}$ sets of final states, since a DFA is minimal if and only if its complementary automaton is minimal too. In this way, we can safely divide the search space and distribute each slice to a different CPU. Note that this approach relies on the assumption that we have a much more efficient way to partition the search space than to actually perform the search (in this case, a minimization algorithm). The task of creating the slices can be taken by a central process that successively generates the next slice and, at the end, assembles all the results. The server can run interactively with its slaves, or it can generate all the slices at once to be used later. The server generates a slice using the generator algorithm presented in Section 4. For this experiment we used two approaches. We developed a simple slave management system, called Hydra, based on Python threads, which was composed of a server and a variable set of slaves. In this case, the slaves can be any computer³. For each slice a process was executed via ssh, and the result was returned to the server. Another approach was to use a computer grid, in particular 24 AMD Opteron 250 2.4GHz (dual core).

 k   n   ICDFA∅        ICDFA          Minimal (f_k(n))   Minimal %   Time (s)
 2   2   12            48             24                 50%         -
 2   3   216           1728           1028               59%         0.018
 2   4   5248          83968          56014              66%         0.99
 2   5   160675        5141600        3705306            72%         79.12
 2   6   5931540       379618560      286717796          75%         8700
 2   7   256182290     32791333120    25493886852        77%         1237313
 3   2   56            224            112                50%         0.002
 3   3   7965          63720          41928              65%         0.7
 3   4   2128064       34049024       26617614           78%         494.72
 3   5   914929500     29277744000    25184560134        86%         652703
 4   2   240           960            480                50%         0.01
 4   3   243000        1944000        1352732            69%         23.5
 4   4   642959360     10287349760    7756763336         75%         184808
 5   2   992           3968           1984               50%         0.041
 5   3   6903873       55230984       36818904           66%         756.2

Table 1: Performance and number of minimal automata.
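The slicing scheme can be illustrated on a very small instance. The sketch below is only an illustration under several assumptions: the universe of ICDFA∅'s is enumerated by brute force (a stand-in for the generator of Section 4), a slice is a half-open range of positions in that enumeration rather than a pair of automata, minimality is tested with Moore-style partition refinement instead of Hopcroft's algorithm, and the slices are dispatched to a thread pool. All function names are hypothetical. Only final-state sets not containing state 0 are tested, following the complement argument above.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice, product


def icdfa0_strings(n, k):
    """All ICDFA0 strings by brute force (tiny n and k only)."""
    for s in product(range(n), repeat=k * n):
        first = [s.index(m) if m in s else None for m in range(1, n)]
        if None in first:
            continue
        if all(first[m - 1] < k * m for m in range(1, n)) and \
           all(first[m - 1] < first[m] for m in range(1, n - 1)):
            yield s


def is_minimal(s, final, n, k):
    """Moore-style refinement: minimal iff all n states end up in distinct blocks."""
    block = [q in final for q in range(n)]
    while True:
        signature = [(block[q],) + tuple(block[s[q * k + a]] for a in range(k))
                     for q in range(n)]
        ids = {}
        refined = [ids.setdefault(sig, len(ids)) for sig in signature]
        if len(ids) == len(set(block)):     # no block was split: stable
            return len(ids) == n
        block = refined


def count_minimal_in_slice(n, k, start, stop):
    """Minimal ICDFA's whose structure lies at positions [start, stop)."""
    count = 0
    for s in islice(icdfa0_strings(n, k), start, stop):
        for mask in range(2 ** (n - 1)):    # final sets without state 0
            final = {q for q in range(1, n) if mask >> (q - 1) & 1}
            if is_minimal(s, final, n, k):
                count += 2                  # the complement is minimal too
    return count


def count_minimal(n, k, slice_size, workers=4):
    total = sum(1 for _ in icdfa0_strings(n, k))
    slices = [(a, min(a + slice_size, total))
              for a in range(0, total, slice_size)]
    # Threads only illustrate the dispatching of slices; a real run would
    # send each slice to a separate process or machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda sl: count_minimal_in_slice(n, k, *sl),
                            slices))
```

With these assumptions, `count_minimal(2, 2, 5)` and `count_minimal(3, 2, 50)` reproduce the values 24 and 1028 of Table 1, independently of how the universe is sliced.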

5.1 Experimental results 

In Table 1 we summarise some experimental results. Most of the values for $k = 2$ and $k = 3$ were already given by Domaratzki et al. in [DKS02]; the remaining ones are new. For $k = 2$, $n = 8$ we have divided the universe of ICDFA∅'s into 254 slices, and the estimated CPU time to process each one is 11 days.

Moreover, the slicing process can give new insights about the distribution of minimal automata. Figure 2 presents two examples of the values obtained for the rate of minimal DFA's. For $n = 7$ and $k = 2$ we give the percentage of minimal automata for each of the 257 slices we used to divide the search space (32791333120 ICDFA's). Each slice had about 1000000 ICDFA∅'s, and so about 128000000 ICDFA's, and it took about 78 minutes to process. The whole set of automata was processed in 12 hours of real time on a CPU grid, which corresponds to 344 hours of CPU time.

³ We used all the normal desktop computers of our colleagues in the CS Department.




Figure 2: Rate of minimal DFA's with (k = 3, n = 5) for 915 slices and with (k = 2, n = 7) for 257 slices.



6 A Uniform Random Generator 

The ICDFA∅ representation presented in Section 3 permits an easy random generation of ICDFA's, and thus of DFA's. To randomly generate a DFA for given $n$ and $k$, it is only necessary to: (i) randomly generate a valid sequence of flags $(f_j)_{j \in [1, n-1]}$ according to (G1) and (G2); (ii) randomly generate the remaining elements of the string following rules (G3) to (G5); and (iii) randomly generate the set of final states. The uniformity issue for steps (ii) and (iii) is quite straightforward. For step (iii) it is just necessary to use a uniform random integer generator for a value $i \in [0, 2^n - 1]$. For step (ii), it is enough to use the same number generator repeatedly for values in the range $[0, i]$, for $0 \le i < n$, according to rules (G3) to (G5). Step (i) is the only step that needs special care. Consider the case $n = 5$ and $k = 2$. Because of rule (R2), flag $f_1$ can only be in position 0 or 1. But there are 140450 ICDFA∅'s with $f_1$ in the first case and only 20225 in the second. Thus, to be uniform, the random generation of flags must take this into account by making the first case more probable than the second. We can generate a random ICDFA∅ by generating its representing string from left to right. Supposing that flag $f_{m-1}$ is already placed at position $i$ and all the symbols to its left have been generated, i.e., the prefix $s_0 s_1 \cdots s_i$ is already defined, the process can be described by:

r = random(1, Σ_{j=i+1}^{mk-1} N_{m,j})
for j = i + 1 to mk - 1:
    if Σ_{l=i}^{j-1} N_{m,l} < r ≤ Σ_{l=i}^{j} N_{m,l} then return j
where random(a, b) is a uniformly generated random integer between $a$ and $b$, and $N_{m,j}$ is the number of ICDFA∅'s with prefix $s_0 s_1 \cdots s_i$ and with the first occurrence of symbol $m$ in position $j$, setting $N_{m,i} = 0$ to simplify the expressions. The values of $N_{m,j}$ could be obtained from expressions similar to Equation (1) and used in a program, but the program would have exponential time complexity. By expressing $N_{m,j}$ in a recursive form we have, given $k$ and $n$:

$$N_{n-1,j} = n^{nk-1-j}, \qquad j \in [n-2, (n-1)k-1];$$
$$N_{m,j} = \sum_{i=0}^{(m+1)k-j-2} (m+1)^{i}\, N_{m+1,\,j+i+1}, \qquad m \in [1, n-2],\ j \in [m-1, mk-1]. \qquad (2)$$

In this recursive form, $N_{m,j}$ counts the ways of completing the string after position $j$ once $f_m = j$ is fixed; the free positions between $f_{m-1}$ and $f_m$ contribute a separate factor $m$ each.
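Recurrence (2) translates directly into code. The sketch below (hypothetical names) fills $N_{m,j}$ from the last flag backwards, using the sums of (2) as written rather than the tabulated form, and adds $N_{1,l}$ over the $k$ possible positions $l$ of $f_1$ to recover the total number of ICDFA∅'s. For $n = 5$ and $k = 2$ it reproduces the split 140450/20225 mentioned above.

```python
def n_table(n, k):
    """N[m][j] straight from recurrence (2), for m in [1, n-1] and
    j in [m-1, mk-1]: the number of ways to complete a string once
    the flag f_m is at position j."""
    N = {m: {} for m in range(1, n)}
    for j in range(n - 2, (n - 1) * k):
        N[n - 1][j] = n ** (n * k - 1 - j)
    for m in range(n - 2, 0, -1):
        for j in range(m - 1, m * k):
            # i free positions, each with m + 1 choices, then f_{m+1} = j + i + 1
            N[m][j] = sum((m + 1) ** i * N[m + 1][j + i + 1]
                          for i in range((m + 1) * k - j - 1))
    return N


def total_icdfa0(n, k):
    """Sum of N[1][l] over the k possible positions l of f_1."""
    if n == 1:
        return 1
    N = n_table(n, k)
    return sum(N[1][l] for l in range(k))
```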

This shows that we keep repeating the same computations with very small variations. Thus, if we tabulate these values ($N_{m,j}$), at the obvious price of memory space, we can create a version of a uniform random generator that, apart from a constant overhead used for the tabulation, has a complexity of $O(n^3 k)\,O(\mathit{random})$. The tabulation uses the fact that, for $j < mk - 1$, separating the term $i = 0$ in (2) gives $N_{m,j} = (m+1)N_{m,j+1} + N_{m+1,j+1}$. The algorithm is described by the following:

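import random

# Tabulation of N[m][j] (recurrence (2)); n, k and the table N (a list of
# n lists of k*n integers, row 0 unused) are global.
for i in range((n - 1) * k - 1, n - 3, -1):
    N[n - 1][i] = n ** (n * k - 1 - i)
for m in range(n - 2, 0, -1):
    N[m][m * k - 1] = sum((m + 1) ** i * N[m + 1][m * k + i] for i in range(k))
    for i in range(m * k - 2, m - 2, -1):
        N[m][i] = (m + 1) * N[m][i + 1] + N[m + 1][i + 1]

# Generation of one random ICDFA0 string, printed symbol by symbol.
g = -1
for i in range(1, n):
    f = generateflag(i, g + 1)
    for j in range(g + 1, f):
        print(random.randint(0, i - 1))   # free positions between f_{i-1} and f_i
    print(i)                              # the flag f_i itself
    g = f
for j in range(g + 1, k * n):             # positions after f_{n-1} (rule G5)
    print(random.randint(0, n - 1))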

Using the same AMD Athlon 64 at 2.5GHz and a C implementation with libgmp [GMP], we observed the times reported in Table 2.

n \ k   2        3        5         10         15
10      0.10s    0.16s    0.29s     0.61s      1.30s
20      0.31s    0.49s    1.26s     4.90s      12.24s
30      0.54s    1.37s    3.19s     19.91s     62.12s
50      1.61s    3.86s    17.58s    2.22m      947.71s
75      3.96s    12.98s   76.69s    700.20s    2459.34s
100     7.92s    36.33s   215.32s   2219.04s   8091.30s

Table 2: Times for the random generation of 10000 automata.

It is possible, without unreasonable amounts of RAM, to generate random automata for unusually large values of $n$ and $k$. For example, with $n = 1000$ and $k = 2$ the memory necessary is less than 450MB. The amount of memory used is this large not only because of the number of tabulated values, but also because the values themselves are enormous: the total number of ICDFA∅'s for these values of $n$ and $k$ is greater than $10^{3350}$, and the tabulated values are bounded only by this number.



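def generateflag(m, l):
    # Draw f_m in [l, mk - 1], where l = f_{m-1} + 1, with probability
    # proportional to m^(i-l) * N[m][i]: the factor m^(i-l) counts the
    # free positions between f_{m-1} and f_m (N and k are global).
    total = sum(m ** (i - l) * N[m][i] for i in range(l, m * k))
    r = random.randint(0, total - 1)
    for i in range(l, m * k):
        if r < m ** (i - l) * N[m][i]:
            return i
        r = r - m ** (i - l) * N[m][i]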
6.1 Statistical test of the random generator



Although the method used to generate random automata is, by construction, uniform, we used a $\chi^2$ test to evaluate the quality of the random generation. The universe of ICDFA∅'s with 6 states and 2 symbols has a total size of 5931540. This size is large enough for a test with some significance, and it is still reasonable, both in time and space, to perform the test. We generated three different sets of 30000000 ICDFA∅'s and performed the test on each one. Because of the size of the data, we could not find any tabulated values for acceptance, and thus the following formula was used, with $v = 5931540 - 1$ degrees of freedom and $x_p$ the point of the standard normal distribution corresponding to the significance level (1% in this case):

$$v + \sqrt{2v}\,x_p + \frac{2}{3}x_p^2 - \frac{2}{3}.$$

The size of the data sets and the repetition of the test three times follow the procedure recommended by Knuth ([Knu81], pages 35-39). For the three experiments, the values obtained were, respectively, 5933268.92456, 5925676.75108 and 5935733.28172, which are all smaller than the acceptance limit, which in this case was 5938980.75468.
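The acceptance limit above is easy to compute. This sketch (hypothetical names) assumes Pearson's statistic and obtains $x_p$ from the standard normal distribution via `statistics.NormalDist` (Python 3.8 or later); for a small number of degrees of freedom the approximation can be compared with tabulated values, e.g. about 135.81 for $v = 100$ at the 1% level.

```python
from statistics import NormalDist


def chi_square(observed, expected):
    """Pearson statistic: the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def acceptance_limit(v, significance=0.01):
    """Knuth's large-v approximation of the upper percentage point of the
    chi-square distribution with v degrees of freedom."""
    x_p = NormalDist().inv_cdf(1 - significance)
    return v + (2 * v) ** 0.5 * x_p + 2 / 3 * x_p ** 2 - 2 / 3
```

A sample passes the test when `chi_square(counts, expected) < acceptance_limit(len(counts) - 1)`, where `counts` holds how many times each ICDFA∅ was drawn and every entry of `expected` equals the sample size divided by the size of the universe.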



7 Enumeration of ICDFA∅'s

In this section, we show how, given the string representation of an ICDFA∅ of size $n$ over an alphabet of $k$ symbols, we can compute its number in the generation order (described in Section 4) and vice versa, i.e., given a number less than $B_{k,n}$, we obtain the corresponding ICDFA∅. This provides an optimal encoding for ICDFA∅'s, as defined by M. Lothaire in [Lot05], Chapter 9. This bijection is accomplished using the tables defined in Section 6, which correspond to partial sums of Equation (1).

Theorem 4 $B_{k,n} = \sum_{l=0}^{k-1} N_{1,l}$.

Proof 2 The result follows easily by expanding $N_{m,j}$ using Equation (2) and comparing with Equation (1).



7.1 From ICDFA∅'s to Integers

Let $(s_i)_{i \in [0, kn-1]}$ be the string representation of an ICDFA∅, and let $(f_j)_{j \in [1, n-1]}$ be the corresponding sequence of flags. From the sequence of flags we obtain the following number, $n_f$,

$$n_f = \sum_{i=1}^{n-1}\ \sum_{j=f_i+1}^{ik-1} i^{\,j-f_i}\, N_{i,j} \left( \prod_{m=1}^{i-1} (m+1)^{f_{m+1}-f_m-1} \right), \qquad (3)$$

which is the number of the first ICDFA∅ with flags $(f_j)_{j \in [1, n-1]}$. Now we must add the information provided by the rest of the elements of the string $(s_i)_{i \in [0, kn-1]}$:

$$n_r = \sum_{j=1}^{n-1} \left( \sum_{l=f_j+1}^{f_{j+1}-1} s_l\, (j+1)^{f_{j+1}-l-1} \prod_{m=j+1}^{n-1} (m+1)^{f_{m+1}-f_m-1} \right), \qquad (4)$$

with $f_n = kn$. The corresponding number is $n_s = n_f + n_r$.
7.2 From Integers to ICDFA∅'s



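Given an integer $0 \le m < B_{k,n}$, a string uniquely representing an ICDFA∅ can be obtained using a method inverse to the one in the last section. The flags $(f_j)_{j \in [1, n-1]}$ are generated by successive subtractions, scanning the possible positions of each flag from right to left. The rest of the string $(s_i)_{i \in [0, kn-1]}$ is generated considering the remainders of integer divisions. The algorithms are the following:

# Global state: n, k and the table N of Section 6; m is the given integer.
# First part: the flags f[1..n-1] (f[0] = -1 is a sentinel).
f = [-1] * n
prod = 1                     # choices already fixed by f_1, ..., f_{i-1}
for i in range(1, n):
    j = i * k - 1
    p = i ** (j - f[i - 1] - 1)
    while j > f[i - 1] + 1 and m >= p * prod * N[i][j]:
        m = m - N[i][j] * p * prod
        j = j - 1
        p = p // i
    f[i] = j
    prod = prod * p

# Second part: the remaining symbols, from right to left.
s = [0] * (k * n)
for j in range(1, n):
    s[f[j]] = j
i = k * n - 1
j = n - 1
while m > 0 and j > 0:
    while m > 0 and i > f[j]:
        s[i] = m % (j + 1)
        m = m // (j + 1)
        i = i - 1
    i = f[j] - 1
    j = j - 1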
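The two procedures can be packaged as a pair of inverse functions and checked exhaustively on a small case. In this sketch the names `build_table`, `rank` and `unrank` are hypothetical; the table is filled as in Section 6, `rank` implements Equations (3) and (4), and `unrank` follows the algorithms above, with the integer m renamed r.

```python
def build_table(n, k):
    """N[m][j] of recurrence (2), tabulated as in Section 6 (rows m >= 1)."""
    N = [[0] * (k * n) for _ in range(n)]
    for j in range(n - 2, (n - 1) * k):
        N[n - 1][j] = n ** (n * k - 1 - j)
    for m in range(n - 2, 0, -1):
        N[m][m * k - 1] = sum((m + 1) ** i * N[m + 1][m * k + i] for i in range(k))
        for j in range(m * k - 2, m - 2, -1):
            N[m][j] = (m + 1) * N[m][j + 1] + N[m + 1][j + 1]
    return N


def rank(s, n, k, N):
    """n_s = n_f + n_r, Equations (3) and (4)."""
    f = [-1] + [s.index(j) for j in range(1, n)] + [k * n]  # f_0, f_n sentinels
    n_f, prefix = 0, 1
    for i in range(1, n):
        for j in range(f[i] + 1, i * k):      # flag sequences visited earlier
            n_f += i ** (j - f[i]) * N[i][j] * prefix
        prefix *= (i + 1) ** (f[i + 1] - f[i] - 1)
    n_r, j = 0, 1
    for i in range(f[1] + 1, k * n):
        if i == f[j + 1]:
            j += 1                            # entering the segment after f_j
        else:
            n_r = n_r * (j + 1) + s[i]        # s_i is a digit in base j + 1
    return n_f + n_r


def unrank(r, n, k, N):
    """Inverse of rank: subtractions for the flags, then remainders."""
    f = [-1] * n
    prod = 1
    for i in range(1, n):
        j = i * k - 1
        p = i ** (j - f[i - 1] - 1)
        while j > f[i - 1] + 1 and r >= p * prod * N[i][j]:
            r -= p * prod * N[i][j]
            j -= 1
            p //= i
        f[i] = j
        prod *= p
    s = [0] * (k * n)
    for j in range(1, n):
        s[f[j]] = j
    i, j = k * n - 1, n - 1
    while r > 0 and j > 0:
        while r > 0 and i > f[j]:
            s[i], r = r % (j + 1), r // (j + 1)
            i -= 1
        i, j = f[j] - 1, j - 1
    return s
```

For $n = 3$ and $k = 2$, every integer in $[0, 215]$ is mapped to a distinct string and back, the first string 010200 gets number 0 and the last string 122222 gets number 215.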



8 Final Remarks 

The methods presented here were implemented and tested to obtain both exact and approximate values for the density of minimal automata. Champarnaud et al. [CP05] checked a conjecture of Nicaud that, for $k = 2$, the number of minimal ICDFA's is about 80% of the total, by sampling automata with 100 states (for all possible numbers of final states). Our results also corroborate that conjecture, with exact values for some small values of $n$ and samples for greater values. In particular, for $k = 2$ and $n = 100$ we obtained the same results as Champarnaud et al. It seems that for $k > 2$ almost all ICDFA's are minimal. For $k = 3, 5$ and $n = 100$ this was also checked by Champarnaud et al. For a confidence interval of 99% and a significance level of 1%, the following table presents the percentages of minimal ICDFA's for several values of $k$ and $n$, and each possible number of final states.



k \ n   5       6       7       8       9       10      20       40       80       160
3       85.8%   90.8%   93.3%   95.0%   96.1%   96.7%   98.7%    99.4%    99.7%    99.8%
5       93.0%   96.5%   98.2%   99.1%   99.5%   99.8%   100.0%   100.0%   100.0%   100.0%
7       93.7%   96.8%   98.4%   99.2%   99.6%   99.8%   100.0%   100.0%   100.0%   -
9       93.7%   96.9%   98.4%   99.2%   99.6%   99.8%   100.0%   100.0%   -        -
11      93.8%   96.9%   98.4%   99.2%   99.6%   99.8%   100.0%   100.0%   -        -
13      93.7%   96.9%   98.4%   99.2%   99.6%   99.8%   100.0%   100.0%   -        -


A web interface to the random generator can be found on the FAdo project web page [Fad]. Bassino and Nicaud [BN] also presented a random generator of ICDFA's based on Boltzmann samplers, recently introduced by Duchon et al. [DFLS04]. However, the sampler is uniform for partitions of a set with $kn$ elements into $n$ nonempty subsets (not for the universe of automata). These partitions are related to string representations that verify only rule (R1). Based on the work presented here, it would be interesting to study a better approximation that would also satisfy rule (R2).